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Abstract — The stability of sparse signal reconstruction is investi- 
gated in this paper. We design efficient algorithms to verify the 
sufficient condition for unique £1 sparse recovery. One of our 
algorithm produces comparable results with the state-of-the-art 
technique and performs orders of magnitude faster. We show 
that the £\ -constrained minimal singular value (A-CMSV) of 
the measurement matrix determines, in a very concise manner, 
the recovery performance of £\ -based algorithms such as the 
Basis Pursuit, the Dantzig selector, and the LASSO estimator. 
Compared with performance analysis involving the Restricted 
Isometry Constant, the arguments in this paper are much less 
complicated and provide more intuition on the stability of sparse 
signal recovery. We show also that, with high probability, the 
subgaussian ensemble generates measurement matrices with £i- 
CMSVs bounded away from zero, as long as the number of 
measurements is relatively large. To compute the ^i-CMSV and 
its lower bound, we design two algorithms based on the interior 
point algorithm and the semi-definite relaxation. 

Index Terms — l\ -constrained minimal singular value, Basis 
Pursuit, Dantzig selector, interior point algorithm, LASSO esti- 
mator, restricted isometry property, sparse signal reconstruction, 
semidefinite relaxation, verifiable sufficient condition 

I. Introduction 

Sparse signal reconstruction aims at recovering a sparse 
signal x E K" from observations of the following model: 



y = Ax + w, 



(1) 



where A E jjmxn j s ^ measuremen t G r sensing matrix, 
y is the measurement vector, and w E M m is the noise 
vector. The sparsity level A; of a; is defined as the number 
of non-zero components of x. The measurement system is 
underdetermined because the number of measurements in is 
much smaller than the signal dimension n. However, when 
the sparsity level k is also small, it is possible to recover x 
from y in a stable manner. Reconstruction of a sparse signal 
from linear measurements appears in many signal processing 
branches, such as compressive sensing [l]-[3], sparse linear 
regression [4], source localization J5}, ||6), sparse approxi- 
mation, and signal denoising [7]. Model ([TJi is applicable to 
many practical areas such as DNA microarrays [8|-[11|, radar 
imaging fl2)-[14|, cognitive radio fT5) , and sensor arrays |5|, 
|(6), to name a few. 

This paper is motivated by the following considerations. 
When we are given a sensing or measurement system ([TJ, 
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we usually want to know the performance of the system 
before using it, or at least to know whether it works in 
the ideal setting. This involves deriving a verifiable sufficient 
condition for unique recovery and a computationally amenable 
performance measure. Furthermore, in signal processing and 
the applications mentioned in the previous paragraph, we 
usually have the freedom to design the sensing matrix; that is, 
we can choose the best from a collection of sensing matrices. 
For example, in radar imaging and sensor array applications, 
sensing matrix design is connected with waveform design and 
array configuration design. The ^i-CMSV of this paper has 
already been used to optimally design orthogonal frequency 
division multiplexing (OFDM) radar signals for detecting a 
moving target in the presence of multipath reflections |16|. 
We hope the practitioners in similar fields will also find the 
results in this paper useful. 

The contribution of the work is fourfold. First, we design 
two algorithms to verify the sufficient condition that a sparse 
signal can be uniquely recovered using l\ minimization in 
the noise-free setting. By solving multiple linear programs 
efficiently, one of the algorithms produces comparable results 
with the state-of-the-art verification algorithms and performs 
orders of magnitude faster. Second, we derive concise bounds 
on the I2 norm of estimation error for the Basis Pursuit, 
the Dantzig selector, and the LASSO estimator in terms of 
the l\— CMSV. As the third contribution, we demonstrate 
that if the number of measurements m is reasonably large, 
subgaussian random matrices have ^i-CMSVs bounded away 
from zero, with high probability. This implies that at least for 
subgaussian random matrices, the ^-CMSV is as good as the 
restricted isometry constant. Last but not least, we develop 
algorithms to compute the ^i-CMSV for an arbitrary sensing 
matrix and compare their performance. These algorithms are 
by no means the most efficient ones. However, once we 
shift from an optimization problem with a discrete nature 
(e.g., the restricted isometry constant) to a continuous one, 
there are many optimization tools available and more efficient 
algorithms can be designed. 

Many quantities and properties on the sensing matrix A 
have been proposed to guarantee a stable or unique signal 
reconstruction, most notably the Restricted Isometry Constant 
(RIC) (1), (T7) and the Null Space Property (NSP) (18). 
The RIC provides a unified framework to deal with sparse 
signal recovery and has very nice geometrical explanations. 
However, it is known to be very difficulty to compute. There- 
fore, computable bounds on quantities involved in the RIC 
and the NSP are computed using, for example, semi-definite 



programming relaxation fl9), |20|, and linear programming 
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relaxation (2"T). T o the best of the authors' knowledge, the 
algorithms of 1 20 1 and pT) in verifying the sufficient condition 
of unique £i recovery represent the state-of-the-art technique 
in this direction. In our paper, instead of computing a quantity 
(a/c in pll) for various sparsity levels k and checking if it 
is less than 1/2, we directly seek the critical sparsity level 
below which unique recovery is guaranteed. We compare our 
verification algorithms with those in [20| and |21| using nu- 
merical simulations. One of our algorithms performs orders of 
magnitude faster, consumes much less memory, and produces 
comparable results. 

The paper is organized as follows. In Section [II] we present 
the measurement model, three convex relaxation algorithms, 



and the RIC. Section III is devoted to deriving sufficient 
conditions for unique £1 recovery and deriving bounds on 
the recovery errors of several convex relaxation algorithms. 



In Section IV we show that the majority of realizations of the 
subgaussian measurement ensemble have good £\ -CMS Vs. In 
Section|V] we design algorithms to verify unique recovery, and 
compute the ^-CMSV and its lower bound. We compare the 
algorithms' performance in Section VI Section VII summaries 
our conclusions. 

II. The Measurement Model, Reconstruction 
Algorithms, and Restricted Isometry Property 

A. The Measurement Model 

The following measurement model is used throughout the 
paper. Suppose we have a sparse signal x G R n , i.e., x has 
only a few non-zero components. The sparsity level k of x 
is defined as the number of non-zero elements of x, or the 
£0 "norm" of x: k = \\x\\q. We call a vector k— sparse if 
its sparsity level ||x||o < k. For ease of presentation, we 
restrict ourselves to exactly sparse signals in this paper and 
leave approximately sparse signals to future work. 

We observe y G M. m through the following linear model: 



y 



Ax + w. 



(2) 



where A £ R mx ™ i s the sensing/measurement matrix and w G 
K m is the noise/disturbance vector. 

In this paper, we focus on three most renown recovery 
algorithms based on convex relexation: the Basis Pursuit (BP) 
p2| , the Dantzig Selector (DS) (23), and the LASSO estimator 
1 24 



BP: min ||z||i s.t. \\y - Az\\ 2 < e 



(3) 



DS: min IIzHj s.t. \\A T {y - Az)^ < \ n a (4) 



LASSO: min -II y 



Az\\l + X n a\\z\ 



(5) 



Here A„ is a turning parameter, and e and a is a measure of 
the noise level. All these three optimization programs can be 
implemented efficiently using convex programming or even 
linear programming. 

The performance of the BP, the DS and the LASSO, more 
specifically the error bounds on the solutions of these algo- 
rithms, usually involve the incoherence of the sensing matrix 
A. Many quantities are proposed to measure the incoherence 



of the sensing matrix, for example, the Restricted Isometry 
Constant (RIC) J5J, (17) , the Restricted Eigenvalue assumption 
(25), and the Restricted Correlation assumption (26}, among 
others. However, these quantities are very difficult to compute. 
For example, the only known technique to exactly compute the 
RIC is test all its submatrices of certain size. 

We will compare our CMSV based bounds with the RIC 
based bounds. For this purpose, we follow (3), (IT) to define 
the RIC as follows: 

Definition 1 For each integer k G {1, . . . , n}, the restricted 
isometry constant (RIC) 8^ of a matrix A G M. mxn is defined 
as the smallest 8 > such that 



1-8 < 



\Ax\\ 



< 1 



(6) 



holds for arbitrary non-zero k— sparse signal x. 



The RIC has very clear geometrical meanings. A matrix A 
with a small 8^ roughly means that A is nearly an isometry 
when restricted onto all k— sparse vectors. 

Now we cite some of the most renown performance results 
on the BP, the DS, and the LASSO, which are expressed in 
terms of the RIC. Assume x is a fc— sparse signal and x is its 
estimate given by any of the three algorithms; then we have 
the following: 

1) BP (n): Suppose that 8 2k < %/2 - 1 and ||w|| 2 < e. 
The solution to the BP ([3]) satisfies 



'2k- 



1 - (1 + V2)8 



2k- 



2) DS 1 23 1: If the noise w satisfies WA 1 w\ 



(7) 



< A„cr, and 



'2k 



83k < 1. Then, the error signal obeys 



4Vk 



1-8- 



- A„er. 



(8) 



2 k- 



>:SA- 



3) LASSO p7| : Consider the noise-free case. Under the 
condition of incoherence design with a sparsity multi- 
plier sequence e„, the error associated with the LASSO 
estimator x is bounded for sufficiently large n by 



sella < 17.5 • A n <7 ■ 



«XJ 2 



(9) 



Here, the sparsity level k = k n depends on n. Refer 
to 1 27 1 for more information on incoherence design and 
multiplier sequence. 

We note that in these error bounds, the terms involving the 
RIC on the right hand sides are quite complicated. We will 



compare these results with our bounds in Section III which 



are much more concise and whose derivations are much less 
involved. 

Although the RIC provides a measure quantifying the good- 
ness of a sensing matrix, as mentioned earlier, its computation 
poses great challenge. The computation difficulty is compen- 
sated by the nice properties of RIC for a large class of random 
sensing matrices. We cite one general result below |28|: 
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Let A G jjmxn k e a random matrix whose entries 
are i.i.d. samples from any distribution that satisfies the 
concentration inequality for any x g W 1 and < e < 1: 



\Ax\\i- 



> e\\x 



< 2e 



-mc (e) 



(10) 



Then, for any given (5 G (0, 1), there exist constants 
cy , C2 > depending only on 5 such that <5fc < <5, with 
probability not less than 1 — 2e~ C2 "\ as long as 



m > c\k log 



k ' 



(11) 



We remark that distributions satisfying the concentration 
inequality ( flO) include the Gaussian distribution and the 
Bernoulli distribution. For the £i-CMSV, we will establish a 
theorem similar to the one above for the subgaussian ensemble 
with the same bound on m. The subgaussian ensemble in this 
paper includes the Gaussian ensemble, the Bernoulli ensemble, 
as well as the normalized volume measure on various convex 
symmetric bodies , for example, the unit balls of i£ for 
2 < p < oo (29). 

III. Stability of Convex Relaxation based on the 
i\ -Constrained Minimal Singular Value 

In this section, we present two approaches to verify the 
sufficient condition for the uniqueness of l\ -recovery. We also 
derive bounds on the reconstruction errors for the BP, the DS 
and the LASSO. Our bounds are given in terms of the l\- 
CMSV rather than the RIC of matrix A . 

We first introduce a quantity that measures the sparsity, (or, 
more accurately, the density), of a given vector x. 

Definition 2 The iy-sparsity level of a non-zero vector x G 
K™ is defined as 



s(x) 



(12) 



The scaling and permutation invariant s(x) is indeed a 
measure of sparsity. To see this, suppose ||cc||o = k; then the 
Cauchy-Schwartz inequality implies that 



s(x) < k, 



(13) 



and we have equality if and only if the absolute values of 
all non-zero components of x are equal. Therefore, the more 
non-zero elements x has and the more evenly the magnitudes 
of these non-zero elements are distributed, the larger s p (x). In 
particular, if x has exactly one non-zero element, then s{x) = 
1; if x has n non-zero elements with the same magnitude, then 
s(x) = n. 

We use the i±— sparsity level as a tool to relax the necessary 
and sufficient condition for exact l\ recovery in the noise less 
setting [ 30 1 — [ 32 1 . In particular, Zhang showed in p0| that x 
with ||a;||o = k is the unique solution to BP with e = 0: 



if and only if 



min II ;z|| i s.t. Ax = Az 

zGR™ 



(14) 



(15) 



for any z such that Az = and any index set S C {1, . . . , n} 
of size at most k. We are interested in finding k*, the maximal 
k such that the necessary and sufficient condition ( fT5] l is 
satisfied. 

We note that a sufficient condition for exact l\ recovery is 



s(z) > 4fc 



(16) 



for any z G Kcr(A) = {z : Az = 0}. This is because the 
negation of ( |15) : 

3z £ Ker(A) and S with size at most k 
such that \zi\ > ^ \zi\ 



ies 



implies 



kill < 2^1*4 



i£S 



< 



ies 



< 2Vk\\z\ 



2- 



Therefore, the minimization of s(z) over Kei(A) yields a 
lower bound on k*. Unfortunately, this optimization is very 
difficult. In section [V] we present a semidefinite relaxation 
algorithm to obtain a lower bound on k*. 

Another relaxation approach is to replace the £2 norm in 
the definition of the ^-sparsity level with the £ x norm. Note 
that the negation of ( fl5| ) also implies that 



Nli < 2£W 

ies 

< Zfcllzlloo. 

Therefore, the following optimization problem 

1 



mm - 

::Az=0 2 



ll*lll 
Zlloo 



(17) 
(18) 

(19) 



finds a lower bound on the maximal k such that ( fl"5j ) is satis- 
fied. In Section|VJ we will present a polynomial time algorithm 
to solve ( fT9] l. The algorithm solves n linear programs and 
produces results comparable to the best known results in [21 1 
within a much shorter time. 

In the noisy setting, our derivation of the error bounds for 
the BP, the DS, and the LASSO relies heavily on the fact that 
the error vectors have small l\— sparsity levels. 

Now we are ready to define the l\ —constrained minimal 
singular value (CMSV): 

Definition 3 For any s £ [l,n] and matrix A G R mxn , define 
the i\-constrained minimal singular value (abbreviated as l\- 
CMSV) of A by 

\\Ax\\ 2 



Ps(A) 



clef 



XytO, s(x)<s \\X 2 



(20) 



Intuitively, the £\— CMSV p s {A) measures the invertibility 
of the operator A : W 1 1— > W 11 when restricted onto vectors 
with l\ -sparsity level not greater than s. 
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In the following theorem, we present our error bounds in 
terms of the l\— CMSV, whose proof is given in Appendix |A| 

Theorem 1 Suppose the support of the true signal x is of size 
k. 

1) If the noise w is bounded; that is, \\w\\2 < e, then the 
solution x to the BS ([3| obeys 

2e 



x - x\\ 2 



< 



Pik 



(21) 



2) If the noise w in the DS |4} satisfies \\A T w\\ QO < A„er, 
then the solution to (4} obeys 



II <r 4 ^ 

X 2 < — 5— KiO. 
Plk 



(22) 



3) If the noise w in the LASSO |5]l satisfies \\A T w\\ oa < 
n\ n a for some n £ (0, 1), then the solution x to the 
LASSO estimator <|3j obeys 

k 2Vk 



X 2 < 



1 



-A„er. 



(23) 



As shown in Appendix [A] the procedure of establishing 
Theorem [T] has two steps: 

1) For all three algorithms, show that the error vector h = 
x — x is t\— sparse: s(x) < s, where s — 4fc for the BP 
and the DS, and s = 4fc/(l - nf for the LASSO. This 
automatically leads to a lower bound ||v4A|| 2 > p s 1 1 || 2 i 

2) Obtain an upper bound on ||A/i|| 2 or || A/i|| 2 and invoke 
Definition |3] of the ^i-CMSV . 

The derivation is simpler than those employed for obtaining 
the RIC based bounds. 

When the noise w ~ Af(0, a 2 l m ), as shown by Candes and 
Tao in (23], with high probability, w satisfies the orthogonality 
condition 

\w T Aj\ < X n a for all 1 < j < n, (24) 
for A n = \J2 logn. More specifically, defining the event 



we have 



V(E C ) < 



2n ■ (2tt) 



-1/2 p -K/2 



X,, 



(25) 



(26) 



Therefore, with A„ = \/2(l + t) log n, we obtain 

¥(E) > 1 - (vM 1 + t) log n ■ n*) . (27) 

As a consequence, the conditions on noise in Theorem [T] for 
the DS and the LASSO hold with high probability. 

Compared with the RIC bounds 0, |8j, and |9]), our CMSV 
bounds ( f2"T| i, ( [22] ), and ( |23| ) are more concise. Of course, if the 
CMSV p s (A) is not bounded away from zero, these concise 
bounds would not offer much. We will show in Section[jV]that, 
at least for a large class of random matrices, the corresponding 
£i-CMSVs are bounded away from zero with high probability 
if rn > c\k log jr. 



iv. i\ -constrained minimal singular values of 
Random Matrices 

This section is devoted to analyzing the property of the l\- 
CMSVs for the subgaussian ensemble. We employ a recent 
estimate on the behavior of empirical processes involving 
subgaussian random variables [29]. 

Before we turn to the general empirical process result of 
(29l developed by the delicate use of the powerful generic 
chaining idea, we need some notations and definitions. For a 
scalar random variable X, the Orlicz ip 2 norm is defined as 



\x\ 



lp2 



inf < t > : E exp 



t 2 



< 2 



(28) 



Markov's inequality immediately gives that X with finite 

|| X II,/,., has subgaussian tail 



\X\>t) <2e X p(-ct 2 /\\X\\ i , 2 ). 



(29) 



The converse is also true, i.e., if X has subgaussian tail 
cxp(-f7/-T 2 ), then \\X\\^ 2 < cK. 

A random vector X € M n is called isotropic and subgaus- 
sian if E| (X,u) | 2 = HI 2 , and || (X,u) \\^ 2 < L\\u\\ 2 hold 
for any u € R n . A random vector X with independent sub- 
gaussian entries X\,... ,X n is a subgaussian vector because 



(X,u) 



c. 



\ i=i 



< c max \\X, 



(30) 



Clearly, if in addition {Xi} are centered and has unit 
variance, then X is also isotropic. In particular, the standard 
Gaussian vector on K ra and the sign vector with Ltd. 1/2 
Bernoulli entries are isotropic and subgaussian. Isotropic and 
subgaussian random vectors also include the vectors with the 
normalized volume measure on various convex symmetric 
bodies , for example, the unit balls of for 2 < p < 00 [29|. 



We reformulate the l\— CMSV for sensing matrices with 
subgaussian entries using empirical processes. Suppose the en- 
tries of A are Ltd. with subgaussian tails such that E|| AuH 2 , = 
to||m|| 2 for any u £ K™. The rows of A are denoted by 

{af, 
l,\\u 

consequence of 



i = 1,...,to}. Denote U n s = {u e W 
\\l < s}. We note that p s (A/y/m) > 



u = 



1 



sup uen „ 



u T A T Au - 1 



1 m 

— V (a % ,u) 2 - 1 

TO ' " 



< e. 



(31) 



del 



Define a class of functions parameterized by u as T t 
{/«(■) = ( u i ') ; u G ^sl an ^ denote P m the empirical 
measure that puts equal mass at each of the to random 
variables (observations) a%, . . . , a m , i.e., 



Pm(-) 



1 m 

711 ^ ^ 



(32) 
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with S x (-) the dirac measure that puts unit mass at x. 
We realize that {—Y^Li ( a ii u ) 2 } is the empirical process 
{-Pm(/ 2 )}/eJ r s • We slightly abuse notation and use E/ 2 to 
denote E/ 2 (a). Then, our goal is to estimate 



and 



Esup /e ^ |P m (/ 2 )-E/ 2 



{su P/e ^ |P m (/ 2 )-E/ 2 |} 



(34) 



a central topic of the study of empirical processes. 

A key concept in studying general Gaussian processes as 
well as the empirical process {Pm(/ 2 )}/e.F s is the 7 P function 
we are going to define. We need some setup first. For any 
set X, an admissible sequence is a sequence of increasing 
partitions {Qk}k>o of X such that card(Qo) = 1 an d 
card(Qfe) = 2 2 for k > 1. By a sequence of increasing 
partitions, we mean that every set in Qk+i is contained in 
some set of Qk- We will use Qk(x) to denote the unique set 
in partition Qk that contains x. The diameter of Qk(x) is 
denoted by A(Qk(x)). Then we have the following definition 
for 7p function associated with a metric space: 



Definition 4 |34J Suppose (X, d) is a metric space and p > 0. 
We define 



7p 



(X,d) = infsup^^^A^^)), (35) 



k>Q 



where the infimum is taken over all admissible sequences. 

The importance of the 7 p lies in its relationship with the 
behavior of Gaussian process indexed by a metric space when 
the metric is induced by the Gaussian process. More precise, 
suppose {X x } xe x is a Gaussian process indexed by the metric 
space [X, d) with 



d(x,y) = (E(X x -X v f) 1/2 , 



then we have 



crf 2 (X,d) < Esup^X, < C l2 {X,d) 



(36) 



(37) 



for some numerical constants c and C. The upper bound (the 
generic chaining bound) was first established by Fernique p5| 
and the lower bound is obtained by Talagrand using majorizing 



measures [36]. The rather difficult concept of majorizing 
measures has been considerably simplified through the notion 
of "generic chaining", an idea that dates back to Kolmogorov 
and is greatly advanced in recently years by Talagrand (34). 
With these preparations, we present the major result of [29]: 



Theorem 2 p9l Let {a,a^i = l,...,m} C R™ be i.i.d. 

random vectors which induce a measure [i on K™, and T be a 
subset of the unit sphere of ' Li{W n , /i) with diam(J r , || • ||^ 2 ) = 
a. Then there exist absolute constants c\, C2,c^ such that for 
any e > and m > 1 satisfying 

a 2 7l(^",ll • Ik) 



with probability at least 1 — exp(~C2£ 2 m/a 4 ), 



sup /e ^ 



1 m 

771 ^ ^ 



E/ 2 (a) 



k=l 



< €. 



(39) 



(33) Furthermore, if T is symmetric, we have 



Esup f 



m * — » 



k=l 



< C3 max < a 



72 C^, II • IU 2 ) 71(^,11 • life) 



(40) 



We apply Theorem [2] to estimate the ^i-CMSV. Consider 
the function set T = T s = {/«(■) = (u,-) : ||u||| = 
1) IMIi < s }- Assume a g R™ is isotropic and subgaussian. 
As a consequence of the isotropy of a and 1 1 it 1 1 2 = 1, we get 
T s is a subset of the unit sphere of L,2(M. n , p). The symmetry 
of JF S yields 

a = diam(J" s , || • ||^ a ) 

= 2sup M ^„E(M,a) 2 = 2. (41) 

Now the key is to compute ^{J^s, \\ • life)- Due to ( |37| i, the 
problem reduces to the computation of Esup ugWn X u (actually 
an upper bound suffices), where {X M } U £?<" is the canonical 
Gaussian process: 

X u = (g,u) , g~Af(Q,I n ),ue U n s . (42) 

Clearly, we have 

72 (^"s, || • life) < c Esup„ gw „ (g,u) 

< cEllullillflfHoo 

< c y/slogn. (43) 

As a consequence, we have the following theorem: 

Theorem 3 Let the rows of the sensing matrix A be i.i.d. 
subgaussian and isotropic random vectors. Then there exists 
constants C\,C2, C3 such that for any e > and m > 1 
satisfying 

s log n 



we have 



m > Ci 



E|l-p a (A)| < c 2 e, 



and 



[1 - e < p s (A) < 1 + e} > 1 - exp(-c 3 e 2 m) 



(44) 
(45) 
(46) 



m > c\- 



(38) 



This theorem says that at least for subgaussian ensembles 
(including the Gaussian ensemble and the Bernoulli ensemble), 
the £i-CMSV bounds are as tight as the RIC bounds. 

V. Computation of the £i-CMSVs 

In this section, we first describe two algorithms to compute 
a lower bound on the maximal k such that the sufficient 
condition ( fT5j ) is satisfied. This gives a way to verify that the 
l\ recovery is exact in the noiseless setting. The second part 
of this section is devoted to the computation of £i-CMSV and 
its lower bound. 
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A. Verifying the Sufficient Condition for Exact t\ Recovery 

Using the ^i-sparsity level to verify the sufficient condition 
( fl"5j > (refer to Section III I is formulated as the following 
optimization problem: 



mm - - 

z:Az=0 4 



ir 



or equivalently, 



max4||z||, s.t. Az = 0, 



< 1. 



(47) 



(48) 



Unfortunately, this later optimization problem, which maxi- 
mizes the I2 norm over a polyhedron, is NP-hard [37]. By 
defining Z = zz T and dropping the rank constraint, we 
instead use the following semidefinite relaxation to produce 
a lower bound: 



(Lo) : max 4trace(Z) 
z-.z^o 

s.t. tr&ce(AZA T ) = 0, ||Z||i < 1, 



(49) 



where ||^||i is the sum of absolute values of all elements in 
Z. 

Another relaxation based on the norm is to solve the 
following optimization problem (refer to (fl9|)): 

(Loo): max2||z|U s.t. Az = 0,||z||i < 1, (50) 



which is solved by the following n linear programs: 
max2zi s.t. Az = 0, ||z||i < l,i = 1, • • • ,n. 



(51) 



We observe this linear program subproblem is actually the dual 
of the linear program subproblem used in pT) to compute 
ai(A,/3) when /3 = 00. In this paper, the linear program 
subproblems are implemented using the primal-dual algorithm 
detailed in Chapter 11 of |38|. This algorithm produces results 
comparable to those in pT| but is significantly faster. 



B. Computing l\-CMSV 

An advantage of using the ^i-CMSV as a measure of the 
"goodness" of a sensing matrix is the relative ease of its 
computation. The computation of £i-CMSV is equivalent to 



mm 



\Ax\ 



s.t. \\x\ 



< 



% \\x\\ 2 = 1. 



(52) 



Unfortunately, the above optimization is not convex because 
of the £2 constraint ||a;||2 = 1- However, many tools at our 
disposal can deal with the continuous problem ( |52| , for 
example, the Lagrange multiplier or the Karush-Kuhn-Tucker 
condition [39]. We will present an interior point algorithm to 
directly compute an approximate numerical solution of ( f52] >. 
Since the optimization problem ( f52~) > is not convex, there is 
no guarantee that the solution of the algorithm are the true 
minima. Thus, we will also present a convex program to 
compute a lower bound on ^i-CMSV. 

The interior point (IP) method provides a general approach 
to efficiently solve the following general constrained optimiza- 
tion problem: 



min F(z) s.t. f(z) < 0, g(z) = 0. 



(53) 



The basic idea is to construct and solve a sequence of 
penalized optimization problems with equality constraints: 



^^log(cTi) 



mhiF(z) 

Z,<T 

s.t. f(z)+a = 0, g{z) = 



(54) 



By using either a Newton step, which tries to solve the Karush- 
Kuhn-Tucker equations [39], or a conjugate gradient step using 
trust regions to solve the penalized problem < [54| ) in each 
iteration, the interior point approach efficiently generates a 
sequence of solutions that converge to the solution of ( f53] l. 
Refer to |4()| |42| for more information on this approach. 

However, the interior point approach assumes that the 
objective and constraint functions have continuous second 
order derivatives, which is not satisfied by the constraint 
II z Hi ~~ \A — 0- We address the non-differentiability of 
f(z) = \\z\\i — y/s by defining z = z + — z~ with 
z + = max(z, 0) > and z~ = max(— z, 0) > 0, which 
leads to the following augmented optimization: 



IP: 



min (z + - z ) T A T A(z + - z ) 

2 + ,z-GR™ 



subject to zf + z i — s < 0, 



(z+ - z-y (z+ - z-) = 1, 

z + > 0, z~ > 0. 



(55) 



This algorithm is employed in |16| to design the transmitting 
waveform of an OFDM radar for optimal detection and 
estimation performance. 

We briefly describe a semidefinite relaxation (SDR) ap- 
proach to compute a lower bound on £i-CMSV. A similar 
method was employed in fl9) to compute an upper bound on 
sparse variance maximization using the lifting procedure for 



semidefinite programming [43|-[45|. Defining Z = zz and 



dropping the rank constraint transform problem ( |52] i into the 
following form: 



SDR: min 
s.t. 



tia,ce(A T AZ) 
\\Z\\i < s,trace(Z) 



1. 



(56) 



Now SDR is a semidefinite programming problem. For a small 
size problem, a global minimum can be achieved at high 
precision using SEDUMI (46}, SDPT3 (47) or CVX (48). 
However, for relatively large n, the interior point algorithm 
makes the memory requirement prohibitive (see [19] for more 
discussion). In this paper, we do not consider more efficient 
implementations of the SDR. 

VI. Numerical Simulations 

We use numerical simulations to assess the effectiveness and 
efficiency of the algorithms presented in Section [V] Except 
for the JN algorithm in Table IV all other experiments were 
performed on a platform with a Pentium D CPU@3.40GHz, 
2GB RAM, and a Windows XP operating system. 



7 



A. Verification of Sufficient Conditions 

We first examine the L 2 ( |49] > and ( |50l > algorithms for 
verifying the sufficient condition ( [15) , These two algorithms 
are compared with the two algorithms d'AE and JN proposed 
in p0| and pT) , respectively. We name the two algorithms 
d'AE and JN using the abbreviations of the authors' names. 
Recall that k* is defined as the maximal k such that ( fT5j ) is 
satisfied. In Table ||J we show the lower bounds on k* for a 
small size Bernoulli matrix with n — 40 computed by L 2 , L^, 
d'AE and JN. The algorithms of d'AE and JN are provided 
by the authors for free download online. 

TABLE I: Comparison of different verification algorithms for 
a Bernoulli matrix with leading dimension n = 40. 



TABLE IV: Comparison of L^ and JN for Gaussian and 
Hadamard matrices with leading dimension n = 1024. In 
the column head, "G" represents Gaussian matrix and "H" 
represents Hadamard matrix. 



m 


lower bounds on k* 


CPU time (s) 


WH) 


Loo(G) 


JN(G) 


Loo(H) 


Loo(G) 


JN(G) 


102 


3 


2 


2 


182 


136 


457 


204 


4 


4 


4 


501 


281 


1179 


307 


6 


6 


6 


872 


510 


2235 


409 


8 


7 


7 


1413 


793 


3659 


512 


11 


10 


10 


1914 


990 


5348 


614 


14 


12 


12 


1362 


1309 


7156 


716 


18 


15 


15 


1687 


1679 


9446 


819 


24 


20 


21 


1972 


2033 


12435 


921 


37 


29 


32 


2307 


2312 


13564 



m 


lower bounds on k* 


CPU time (s) 


Li 


Loo 


d'AE 


JN 


Li 


Loo 


d'AE 


JN 


20 


1 


1 


2 


1 


14.29 


0.41 


1040.40 


4.59 


24 


1 


2 


2 


2 


16.11 


0.38 


694.20 


0.69 


28 


2 


3 


3 


3 


15.12 


0.37 


710.90 


16.20 


32 


2 


3 


4 


3 


15.43 


0.37 


894.08 


2.45 



TABLE II: Comparison of L^ and JN for a Hadamard matrix 
with leading dimension n = 256. 



m 


lower bounds on k* 


CPU time (s) 


Loo 


JN 


Loo 


JN 


25 


1 


1 


3 


35 


51 


2 


2 


6 


70 


76 


3 


3 


7 


102 


102 


4 


4 


9 


303 


128 


5 


5 


9 


544 


153 


7 


7 


13 


310 


179 


9 


9 


15 


528 


204 


12 


12 


18 


1333 


230 


19 


18 


18 


435 



TABLE III: Comparison of L^ and JN for a Gaussian matrix 
with leading dimension n = 256. 



m 


lower bnd on k* 


CPU time (s) 


Loo 


JN 


Loo 


JN 


25 


1 


1 


6 


91 


51 


2 


2 


8 


191 


76 


3 


3 


10 


856 


102 


4 


4 


13 


5630 


128 


4 


5 


16 


5711 


153 


6 


6 


20 


1381 


179 


7 


7 


24 


3356 


204 


10 


10 


25 


10039 


230 


13 


14 


28 


8332 



In the next set of experiments, we compare lower bounds 
on k* computed by L^ and JN, respectively, for n = 256. 
In this case, both the semidefinite relaxation in this paper and 
that in |20| are too time and memory consuming to compute. 
The lower bounds and execution times are shown in Table [TT] 
and [Hi] for a Hadamard and a Gaussian matrix, respectively. 

Table |IV] shows the results of L^ and JN for a Hadamard 
matrix with leading dimension n = 1024. Note the lower 



bounds computed by JN and the CPU times of JN in table 
IV are extracted from pT) . We were not able to carry out the 
computation of JN within reasonable time in our platform. 

From Table |IJ we see that for n = 40, d'AE performs the 
best, and and JN give exactly the same results. However, 
d'AE is very slow in general. For example, even with a first 
order implementation, the d'AE takes more than 37 hours for 
matrices of size 350 x 500, while the L^ takes less than 
3 minutes to finish the computation. From Table [IT III and 



IV we see that L m and JN produces comparable results. 
However, our L^ algorithm performs much faster than the JN 
algorithm. Because the two algorithms solve n linear program 
subproblems that are dual to each other, they should yield 
exactly the same results. However, we observe that sometimes 
the upper bound and lower bound on the lower bound of k* 
computed by JN does not coincide. The difference in speed 
might come from the implementation. Our implementation 
of the linear sub-program employs the primal-dual approach 
detailed in [38] while J2"T| uses the commercial LP solver 
mosekopt J49). 



B. Computation of l\-CMSV 

We next report the test results of the IP and SDR algorithms 
for computing the £i-CMSV and its lower bound, respectively. 
The interior point algorithms IP is implemented using the 
MATLAB® function fmincon. The SDR is solved using CVX 
(48j with the default SDPT3 solver. 

We fist test IP and SDR on a Gaussian matrix A e K 20x60 
for s = 5. Due to the existence of local minima, we need to run 
IP several times and select the minimal function value among 
all the trials as the ^i-CMSV. Fifty random initial points on 
the unit sphere in M 60 are generated for IP. The SDR only 
runs once. The results are shown in Table [V] In this example, 
SDR over-relaxes the problem and produces a zero ^i-CMSV. 

TABLE V: Comparison of IP and SDR for a Gaussian matrix. 





m'mF(z*) 


mca,nF(z*) 


stdF(z') 


mean time (s) 


IP 


0.0666 


0.7133 


0.3661 


2.8903 


SDR 


0.0000 


0.0000 


N/A 


53.1583 
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TABLE VI: IP for a Bernoulli matrix A € R 



| min F(z* ) | mcanF(z*) 


stdF(z*) 


mean time (s) 


IP | 0.000371 | 0.007472 


0.004545 


123.6456 



I constrained singular value for Bernoulli matrix: n = 60 
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Fig. 2: £i-CMSV p s and its bound as a function of m for 
Bernoulli matrix with n = 60 and s = 5, 10, 20. 



Fig. 1: ^-CMSV p s and its bound as a function of s for 
Bernoulli matrix with n — 60 and m = 10, 20, 40. 



The IP is also tested for a Bernoulli matrix A G 



p 50x500 



with results shown in Table VI The CVX implementation of 
SDR takes too much memory to run for n = 500. 

We compare the ^-CMSVs p s and their bounds as a 
function of s computed by IP and SDR, respectively, for 
Bernoulli random matrices. We consider a small-scale problem 
with n = 60 and m = 10,20,40. A matrix B G M 40x60 
with entries {+1,-1} following | Bernoulli distribution is 
generated. For m = 10, 20, 40, the corresponding Bernoulli 
matrix A is obtained by taking the first m rows of B. The 
columns of A are then normalized to have unit norm. The 
normalization implies that p s < p\ = 1. The IP uses 30 
random initial points. As illustrated in Figure[T] the ^i-CMSVs 
and their bounds decrease very fast as s increases. For fixed 
s, increasing m generally (but not necessarily, as shown in 
Figure |2]i increases the £i-CMSV and their bounds. 

In Figure [2] the ^-CMSV p s is plotted as a function of 
m with varying parameter values: s = 5, 10 and 20. With 
s fixed, the two algorithms (IP and SDR) are run for A G 
M. mxn , with m increasing from 2s to n = 60. For each m, 
the construction of A follows the procedure described in the 
previous paragraph. The discrete nature of adding rows to A 
while increasing m makes the curves in Figure |2]not as smooth 
as those in Figure [T] The p s increases with m in general, but 
local decreases do happen. The gap between values computed 
by IP and SDR is also clearly seen for medium s. 

VII. Conclusions 

In this paper, a new measure of a sensing matrix's inco- 
herence, the ^i-CMSV, is proposed to quantify the stability 
of sparse signal reconstruction. It is demonstrated that the 



reconstruction errors of the Basis Pursuit, the Dantzig selector, 
and the LASSO estimator are concisely bounded using the 
^i-CMSV. A generic chaining argument shows that the l\- 
CMSV is bounded away from zero with high probability 
for the subgaussian ensemble, as long as the number of 
measurements is relatively large. One interior point program 
and one semidefinite program are presented to compute the 
^i-CMSV and its lower bound, respectively. Numerical sim- 
ulations assess the algorithms' performance. The ^i-CMSV 
provides a computationally amenable measure of incoherence 
that can be used for optimal design. 

As a by product, two algorithms are designed to verify the 
sufficient conditions guaranteeing the uniqueness of l\ -based 
recovery. The relaxation based algorithm is shown to pro- 
duce comparable results with the state-of-the-art algorithms, 
and performs much faster. 

Appendix A 
Proof of TheoremQ] 

In this appendix, we derive the error bounds presented in 
Theorem Q] 

Proof of Theorem [7J We strictly follow the two-step 
procedure expounded in Section III 



1) In order to establish the ^i-sparsity of the error vector in 
the first step, we suppose S = supp(cc) and |5| = ||a;||o = k. 
Define the error vector h = x — x. For any vector z G K™ 
and any index set S C {l,...,n}, we use zg £ M) s '\ to 
represent the vector whose elements are those of z indicated 
by S. 

We first deal with the BP and the DS. As observed by 
Candes in |17|, the fact that ||x||i = ||a; + h||i is the minimum 



among all z satisfying the constraints in Q and together 
with the fact that the true signal x satisfies the constraints as 
required by the conditions imposed on the noise in Theorem 
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[T| imply that || /i^e 1 1 1 cannot be very large. To see this, we 
observe that 

Hi > ||sb + J»||i 

= ^2\xi + hi\ + ^2 \x { + hi\ 

ies ies c 

> HflBslll-ll^lll + II^S'lll 

= ||aB||i-||/i 5 ||i + ||hs.||i. (57) 
Therefore, we obtain ||/is<=||i < which leads to 

||h||i - + 

= nhsh 

< 2Vk\\h s \\ 2 

< 2Vk\\h\\ 2 , (58) 

where for the next to the last inequality we used the Cauchy- 
Schwart inequality. Inequality ( |58) l is equivalent to 

s(h) < 4k. (59) 



We now continue to establish the l\ sparsity of the error 
vector for the LASSO Q. We borrow ideas from |50| (see 
also 1 25 1). Since the noise w satisfies ||A T «;|| 00 < n\ n a for 
some small n > and a; is a solution to (BJ, we have 

~\\Ax - + A„cr||5;||i < ~\\Ax - y\\\ + X n a\\x\\i. 



Therefore, an argument similar to the one leading to ( |58| > yields 



Consequently, substituting y = Ax + w yields 

A„cr||x||i < 



^\\Ax- y\\l - -\\Ax - yf 2 + A n cr||x||i 



2HH2 - \\\Mx - x) - w\\i + \naWx\U 

1„ 



w 



\l-\\\A{x-x) 



1 



(A(x - x),w) - ^\\w\\l + A n <r||a;||i 

< (A(x-x),w) + X n a\\x\\ 1 
= (x — x,A T w} + \ n o\\x\\i. 

Using the Cauchy-Swcharz type inequality, we get 

A„Cr||i;||l < \\X - xWx^wWaa + A„Cr||as|| ! 

= kA„ct||/i||i + A„cr||a;||i, 

which leads to 

||A||i < k||/i,||i + ||sc||i. 
Therefore, similar to the argument in (|57j> we have 
|| as 11 1 

\\x + h S r- +h s \\i -K(\\h S c + h s \\i) 
\\x + h S c\\i - \\h s \\i -K(\\hso\\i + \\h s \\i) 

= \\x\\i + (l-K)\\h S o\\l-Q. + K)\\hs\\l, 

where S — supp(a;). Consequently, we have 



> 



> 



IMIi < 



1 

T— k 1 



|h 



\\h\W < Vk\\h\\ 2 , 

L — K 



or equivalently, 



s(h) < 



4k 



(60) 



(61) 



2) We now turn to obtain an upper bound on || Ah||2- For the 
BP d5), this is trivial because both x and x satisfy constraint 
|| y — Az\\ < e in ([3]). The triangle inequality yields 



\\Ah\\ 2 = \\A(x-x)\\ 2 

< \\Ax - 2/H2 + ||v - Ax\\ 2 

< 2e. 

It then follows from Definition [3] that 

Pik\\h\\ 2 < ||Aft,|| 2 < 2e. 

Hence, we get 



|X-JC||2 < 



2e 

P4k 



(62) 



(63) 



(64) 



For the DS Q, as shown in [23 1, the condition on noise 
||^4 T ?a>||oo — A n c and the constraint in the Dantzig selector 
Q yield 

\\A T Ah\\ OQ < 2X n a 



(65) 



because 



Aj(w -r) = Af[(y - Ax) - (y - Ax)} 
= Aj(Ax - Ax) = AjAh, (66) 

where f = y — Ax is the residual corresponding to the Dantzig 
selector solution x. Therefore, we obtain an upper bound on 
|| Ah || 2 as follows: 

n 

h T A T Ah = \Y^hi{A T Ah)i\ 

i=l 
n 

< ^i^i-K^h),! 

< 2A„a||h||i. (67) 

Equation $67) , the definition of p 4 k, and equation ( |58] > together 
yield 

^ fe ||h||| < h T A T Ah 

< 2A n <7||h||i 

< 4A„vW||h|| 2 . (68) 

We conclude that 



\x-x\\ 2 < —3— X n a. 

Pik 



(69) 



si- 



to 



Now we establish an upper bound on || AhWl for the LASSO 
<[5j using a procedure similar to the one used for the DS given 
above. We need to establish a bound on 



\A T Ah\\ 



< \\A T (y-Ax)\\ 00 + \\A T ( V -Ax)\\ 

< \\A T w\\ 00 + \\A T (y- Ax)^ 
= n\ n a + \\A T {y - Ax)^. 



(70) 



We again follow the procedure in [50] (see also (25]) to 
estimate ||A T (y — Aai) 1 1 2- Since x is the solution to Q, the 
optimality condition yields that 



A T (y- Ax) e A„cr5||&||i, 

where <9||ii||i = [—1, 1]" is the subgradient of 
at x. 

As a consequence, we obtain 

\\A T (y- Ar)||oo < A„cr. 
Following the same lines in ((67]), we get 

l- 



(71) 
|i evaluated 



\Ah\\l<(K + l)X n a\\h\ 



(73) 



Then, Equation Q, ((70]) and ((72]) 



P 2 ^*\\h\\ 2 2 <\\Ah\\l 



< (k+1)A„ct 



4k 



1 



As a consequence, we get 
1 

as - as 2 < 



2Vk 



1 



« P 4k 



X n a. 



(74) 



(75) 
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